Abstract. In previous work, we presented a novel information-theoretic 
privacy criterion for query forgery in the domain of information retrieval. 
Our criterion measured privacy risk as a divergence between the user's 
and the population's query distribution, and contemplated the entropy 
of the user's distribution as a particular case. In this work, we make a 
twofold contribution. First, we thoroughly interpret and justify the pri- 
vacy metric proposed in our previous work, elaborating on the intimate 
connection between the celebrated method of entropy maximization and 
the use of entropies and divergences as measures of privacy. Secondly, 
we attempt to bridge the gap between the privacy and the information- 
theoretic communities by substantially adapting some technicalities of 
our original work to reach a wider audience, not intimately familiar with 
information theory and the method of types. 

1 Introduction 

During the last two decades, the Internet has gradually become a part of
everyday life. One of the most frequent activities when users browse the
Web is submitting a query to a search engine. Search engines allow users
to retrieve information on a great variety of categories, such as hobbies,
sports, business or health. However, most users are unaware of the
privacy risks they are exposed to [1].

As a concrete example, from November to December of 2008, 61% 
of adults in the U.S. looked for online information about a particular 
disease, a specific treatment, an alternative medicine, and other related 
topics [2]. Such queries could disclose sensitive information and be used
to profile users about potential diseases. In the wrong hands, such private 
information could be the cause of discriminatory hiring, or could seriously 
damage someone's reputation. 

* This work was partly supported by the Spanish Government through projects 
The fact is that the literature of information retrieval abounds with
examples of user privacy threats. Those include the risk of user profiling 
not only by an Internet search engine, but also by location-based service 
(LBS) providers, or even corporate profiling by patent and stock market 
database providers. In this context, query forgery, which consists in ac- 
companying genuine with forged queries, appears as an approach, among 
many others, to preserve user privacy to a certain extent, if one is willing 
to pay the cost of traffic and processing overhead. 

In our previous work [3], we presented a novel information-theoretic
privacy criterion for query forgery in the domain of information retrieval. 
Our criterion measured privacy risk as a divergence between the user's 
and the population's query distribution, and contemplated the entropy 
of the user's distribution as a particular case. In this work, we make a 
twofold contribution. First, we thoroughly interpret and justify the pri- 
vacy metric proposed in our previous work, elaborating on the intimate 
connection between the celebrated method of entropy maximization and 
the use of entropies and divergences as measures of privacy. Secondly, 
we attempt to bridge the gap between the privacy and the information- 
theoretic communities by substantially adapting some technicalities of 
our original work to reach a wider audience, not intimately familiar with 
information theory and the method of types. 

Sec. 2 reviews the most relevant approaches in private information
retrieval and privacy criteria. Sec. 3 examines some fundamental concepts
related to information theory which will help to better understand the
essence of this work. Inspired by the maximum entropy method, we put
forth an information-theoretic criterion to measure the privacy of user
profiles in Sec. 4. Sec. 5 applies this criterion to the optimization of the
trade-off between privacy and redundancy for query forgery in private
information retrieval. Conclusions are drawn in Sec. 6.

2 State of the Art in Private Information Retrieval 

Throughout this paper, we shall use the term private information re- 
trieval (PIR) in its widest sense, meaning that we shall not restrict our- 
selves to the cryptographically-based techniques normally connected to 
that acronym. In other words, we shall refer to the more generic scenario 
in which users send general-purpose queries to an information service 
provider, say Googling "highest-grossing film science fiction?". Next, we 
shall introduce the most relevant contributions to PIR with regard to 
query forgery and privacy criteria. 



2.1 Private Information Retrieval 



A variety of solutions have been proposed in information retrieval. Some of 
them are based on a trusted third party (TTP) acting as an intermediary 
between users and the information service provider. Although this
approach guarantees user privacy thanks to the fact that their identity 
is unknown to the service provider, in the end, user trust is just shifted 
from one entity to another. 

Some proposals not relying on TTPs make use of perturbation tech- 
niques. In the particular case of LBS, users may perturb their location 
information when querying a service provider [5]. This provides users with
a certain level of privacy in terms of location, but clearly not in terms 
of query contents and activity. Further, this technique poses a trade-off 
between privacy and data utility: the higher the perturbation of the lo- 
cation, the higher the user's privacy, but the lower the accuracy of the 
service provider's responses. Other TTP-free techniques rely on user
collaboration. In [6,7], a protocol based on query permutation in a trellis of
users is proposed, which comes in handy when neither the service provider 
nor other cooperating users can be completely trusted. 

Alternatively, cryptographic methods for PIR enable a user to pri- 
vately retrieve the contents from a database indexed by a memory ad- 
dress sent by the user, making it unfeasible for the database provider to 
ascertain which entries were retrieved [8,9]. Unfortunately, these methods
require the provider's cooperation in the privacy protocol, are limited to 
a certain extent to query-response functions in the form of a finite lookup 
table of precomputed answers, and are burdened with a significant com- 
putational overhead. 

Query forgery, the focus of our discussion, stands as yet another
alternative to the previous methods. The idea behind this technique is
simply to submit original queries along with false queries. Despite its
plainness, this approach can protect user privacy to a certain extent, at
the cost of traffic and processing overhead, but without the need to trust
the information provider or the network operator. Building upon this
principle, several PIR protocols, mainly heuristic, have been put forth.
In [10,11], a solution is presented, aimed at preserving the privacy of a
group of users sharing an access point to the Web while surfing the
Internet. The authors propose the generation of fake accesses to a Web
page to hinder eavesdroppers in their efforts to profile the group.
Privacy is measured as the similarity between the actual profile of a
group of users and that observed by privacy attackers [10]. Specifically,
the authors use the cosine measure, common in information retrieval [12],
to capture the similarity between the group's genuine distribution and the
apparent one. Based on this model, some experiments are conducted to study
the impact of the construction of user profiles on the performance [13].
In line with this, simple, heuristic implementations in the form of
add-ons for popular browsers have recently appeared [14,15].

Query forgery is also present as a component of other privacy pro- 
tocols, such as the private location-based information retrieval protocol 
via user collaboration in [6,7]. In [16], the authors propose submitting
true and false position data when querying an LBS provider, maintaining 
certain temporal consistency, rather than doing so completely randomly. 

In addition to legal implications, there are a number of technical
considerations regarding bogus traffic generation for privacy [17], as
attackers may analyze not only contents but also activity, timing, routing
or any transmission protocol parameters, jointly across several queries or
even across diverse information services. Furthermore, automated query
generation is naturally bound to be frowned upon by network and information
providers, thus any practical framework must take traffic overhead into
account.

2.2 Privacy Criteria 

In this section we give a broad overview of privacy criteria originally 
intended for statistical disclosure control (SDC), but in fact applicable to 
query logs in PIR, the motivating application of our work. In database 
privacy, a microdata set is defined as a database table whose records 
carry information concerning individual respondents. Specifically, this set 
contains key attributes, that is, attributes that, in combination, may be 
linked with external information to reidentify the respondents to whom 
the records in the microdata set refer. Examples include job, address, 
age and gender, height and weight. In addition, the data set contains 
confidential attributes with sensitive information on the respondent, such 
as health, salary and religion. 

A common approach in SDC is microaggregation, which consists in
clustering the data set into groups of records with similar tuples of key
attribute values, and replacing these tuples in every record within each
group by a representative group tuple. One of the most popular privacy
criteria in database anonymization is k-anonymity [18], which can be
achieved through the aforementioned microaggregation procedure. This
criterion requires that each combination of key attribute values be shared
by at least k records in the microdata set. However, the problem of
k-anonymity, and of enhancements [19-22] such as l-diversity, is their
vulnerability against skewness and similarity attacks [23]. In order to
overcome these deficiencies, yet another privacy criterion was considered
in [24]: a dataset is said to satisfy t-closeness if for each group of
records sharing a combination of key attributes, a certain measure of
divergence between the within-group distribution of confidential
attributes and the distribution of those attributes for the entire dataset
does not exceed a threshold t. An average-case version of the worst-case
t-closeness criterion, using the Kullback-Leibler divergence as a measure
of discrepancy, turns out to be equivalent to a mutual information, and
lends itself to a generalization of Shannon's rate-distortion problem
[25,26].
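The two criteria just described can be illustrated with a minimal Python
sketch. The function names, the toy table and the choice of the
Kullback-Leibler divergence as the measure of discrepancy are assumptions
made here for illustration only; they are not taken from [18] or [24].

```python
from collections import Counter, defaultdict
from math import log2


def is_k_anonymous(records, key_attrs, k):
    """True if every combination of key-attribute values is shared by >= k records."""
    counts = Counter(tuple(rec[a] for a in key_attrs) for rec in records)
    return all(c >= k for c in counts.values())


def empirical_pmf(values):
    """Relative frequency of each value, as a dict value -> probability."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def kl_divergence(p, q):
    """D(p || q) in bits for PMFs stored as dicts; assumes q[x] > 0 where p[x] > 0."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)


def max_group_divergence(records, key_attrs, confidential):
    """Largest divergence between a group's distribution of the confidential
    attribute and its distribution over the whole table (worst case)."""
    overall = empirical_pmf(rec[confidential] for rec in records)
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in key_attrs)].append(rec[confidential])
    return max(kl_divergence(empirical_pmf(v), overall) for v in groups.values())


# Hypothetical, already microaggregated table.
TABLE = [
    {"age": "30-39", "zip": "080**", "disease": "flu"},
    {"age": "30-39", "zip": "080**", "disease": "asthma"},
    {"age": "40-49", "zip": "081**", "disease": "flu"},
    {"age": "40-49", "zip": "081**", "disease": "flu"},
]

if __name__ == "__main__":
    print(is_k_anonymous(TABLE, ["age", "zip"], 2))
    print(max_group_divergence(TABLE, ["age", "zip"], "disease"))
```

Under these assumptions, the table would satisfy t-closeness for any
threshold t no smaller than the value returned by max_group_divergence.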



A simpler information-theoretic privacy criterion, not directly evolved
from k-anonymity, consists in measuring the degree of anonymity
observable by an attacker as the entropy of the probability distribution
of possible senders of a given message [27,28]. A generalization and
justification of such criterion, along with its applicability to PIR, are
provided in [3,29].



3 Statistical and Information-Theoretic Preliminaries 



This section establishes notational aspects, and, in order to make our 
presentation suited to a wider audience, recalls key information-theoretic 
concepts assumed to be known in the remainder of the paper. The mea- 
surable space in which a random variable (r.v.) takes on values will be 
called an alphabet, which, with a mild loss of generality, we shall always 
assume to be finite. We shall follow the convention of using uppercase 
letters for r.v.'s, and lowercase letters for particular values they take on. 
The probability mass function (PMF) p of an r.v. X is essentially a
relative histogram across the possible values determined by its alphabet.
Informally, we shall occasionally refer to the function p by its value p(x).
The expectation of an r.v. X will be written as E X, concisely denoting
$\sum_x x \, p(x)$, where the sum is taken across all values of x in its
alphabet.

We adopt the same notation for information-theoretic quantities used 
in [30] . Concordantly, the symbol H will denote entropy and D relative 
entropy or Kullback-Leibler (KL) divergence. We briefly recall those con- 
cepts for the reader not intimately familiar with information theory. All 
logarithms are taken to base 2. The entropy H(p) of a discrete r.v. X with
probability distribution p is a measure of its uncertainty, defined as

\[ H(X) = -\operatorname{E} \log p(X) = -\sum_x p(x) \log p(x). \]



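Given two probability distributions p(x) and q(x) over the same alphabet,
the KL divergence or relative entropy D(p || q) is defined as

\[ D(p \,\|\, q) = \operatorname{E}_p \log \frac{p(X)}{q(X)} = \sum_x p(x) \log \frac{p(x)}{q(x)}. \]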

The KL divergence is often referred to as relative entropy, as it may
be regarded as a generalization of entropy of a distribution, relative to
another. Conversely, entropy is a special case of KL divergence, as for a
uniform distribution u on a finite alphabet of cardinality n,

\[ D(p \,\|\, u) = \log n - H(p). \tag{1} \]

Although the KL divergence is not a distance in the mathematical
sense of the term, because it is neither symmetric nor satisfies the
triangle inequality, it does provide a measure of discrepancy between
distributions, in the sense that $D(p \,\|\, q) \geq 0$, with equality if, and
only if, p = q. On account of this fact, relation (1) between entropy and
KL divergence implies that $H(p) \leq \log n$, with equality if, and only if,
p = u. Simply put, entropy maximization is a special case of divergence
minimization, attained when the distribution taken as optimization
variable is identical to the reference distribution, or as "close" as
possible, should the optimization problem appear accompanied with
constraints on the desired space of candidate distributions.



4 Entropy and Divergence as Measures of Privacy 

In this paper we shall interpret entropy and KL divergence as privacy
criteria. For that purpose, we shall adopt the perspective of Jaynes'
celebrated rationale on entropy maximization methods [31], which builds
upon the method of types [30, §11], a powerful technique in large
deviation theory whose fundamental results we proceed to review.

The first part of this section will tackle an important question. Sup- 
pose we are faced with a problem, formulated in terms of a model, in which 
a probability distribution plays a major role. In the event this distribution 
is unknown, we wish to assume a feasible candidate. What is the most 
likely probability distribution? In other words, what is the "probability 
of a probability" distribution? We shall see that a widespread answer to 
this question relies on choosing the distribution maximizing the Shan- 
non entropy, or, if a reference distribution is available, the distribution 
minimizing the KL divergence with respect to it, commonly subject to 
feasibility constraints determined by the specific application at hand. 



Our review of the maximum entropy method is crucial because it is 
unfortunately not always known in the privacy community, and because 
the rest of this paper constitutes a sophisticated illustration of its appli- 
cation, in the context of the protection of the privacy of user profiles. As 
we shall see in the second part of this section, the key idea is to model a 
user profile as a histogram of relative frequencies across categories of in- 
terest, regard it as a probability distribution, apply the maximum entropy 
method to measure the likelihood of a user profile either as its entropy 
or as its divergence with respect to the population's average profile, and 
finally take that likelihood as a measure of anonymity. 



4.1 Rationale behind the Maximum Entropy Method 

A wide variety of models across diverse fields have been explained on the
basis of the intriguing principle of entropy maximization. A classical
example in physics is the Maxwell-Boltzmann probability distribution p(v)
of particle velocities V in a gas [32,33] of known temperature. It turns out
that p(v) is precisely the probability distribution maximizing the entropy,
subject to a constraint on the temperature, equivalent to a constraint on
the average kinetic energy, in turn equivalent to a constraint on
$\operatorname{E} V^2$. Another well-known example, in the field of electrical
engineering, of the application of the maximum entropy method, is Burg's
spectral estimation method [34]. In this method, the power spectral density
of a signal is regarded as a probability distribution of power across
frequency, only partly known. Burg suggested filling in the unknown portion
of the power spectral density by choosing that maximizing the entropy,
constrained on the partial knowledge available. More concretely, in the
discrete case, when the constraints consist in a given range of the
autocorrelation function, up to a time shift k, the solution turns out to
be a k-th order Gauss-Markov process [30]. A third and more recent example,
this time in the field of natural language processing, is the use of
log-linear models, which arise as the solution to constrained maximum
entropy problems in computational linguistics.

Having motivated the maximum entropy method, we are ready to
proceed to describe Jaynes' attempt to justify, or at least interpret it,
by reviewing the method of types of large deviation theory, a beautiful
area lying at the intersection of statistics and information theory. Let
$X_1, \dots, X_k$ be a sequence of k i.i.d. drawings of an r.v. uniformly
distributed in the alphabet $\{1, \dots, n\}$. Let $k_i$ be the number of times
symbol $i = 1, \dots, n$ appears in a sequence of outcomes $x_1, \dots, x_k$,
thus $k = \sum_i k_i$. The type t of a sequence of outcomes is the relative
proportion of occurrences of each symbol, that is, the empirical
distribution $t = \left(\frac{k_1}{k}, \dots, \frac{k_n}{k}\right)$, not
necessarily uniform. In other words, consider tossing an n-sided fair die
k times, and seeing exactly $k_i$ times face i. In [31], Jaynes points out
that

\[ H(t) = H\!\left(\frac{k_1}{k}, \dots, \frac{k_n}{k}\right) \simeq \frac{1}{k} \log \frac{k!}{k_1! \cdots k_n!} \quad \text{for } k \gg 1. \]

Loosely speaking, for large k, the size of a type class, that is, the
number of possible outcomes for a given type t (permutations with repeated
elements), is approximately $2^{k H(t)}$ in the exponent. The fundamental
rationale in [31] for selecting the type t with maximum entropy H(t) lies in
the approximate equivalence between entropy maximization and the
maximization of the number of possible outcomes corresponding to a type. In
a way, this justifies the infamous principle of insufficient reason,
according to which, one may expect an approximately equal relative frequency
$k_i/k = 1/n$ for each symbol i, as the uniform distribution maximizes the
entropy. The principle of entropy maximization is extended to include
constraints also in [31].
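The approximation between the size of a type class and the entropy of the
type can be observed numerically. The following Python sketch, whose
function names and toy type (0.5, 0.3, 0.2) are assumptions chosen for
illustration, compares the exact exponent of the multinomial coefficient
with H(t) for growing k.

```python
from math import factorial, log2


def entropy(t):
    """Shannon entropy H(t) in bits of a PMF given as a list."""
    return -sum(ti * log2(ti) for ti in t if ti > 0)


def log2_type_class_size(counts):
    """log2 of k! / (k_1! ... k_n!), the number of sequences of length
    k = sum(counts) in which symbol i appears exactly counts[i] times."""
    size = factorial(sum(counts))
    for ki in counts:
        size //= factorial(ki)   # exact integer division at every step
    return log2(size)


def exponent_rate(counts):
    """(1/k) log2 of the type class size, to be compared with H(t)."""
    return log2_type_class_size(counts) / sum(counts)


if __name__ == "__main__":
    for k in (10, 100, 1000):
        counts = [5 * k // 10, 3 * k // 10, 2 * k // 10]
        t = [ki / k for ki in counts]
        print(k, exponent_rate(counts), entropy(t))
```

As k grows, the first printed quantity approaches the second from below,
consistently with the statement that the approximation holds in the
exponent and for large k only.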

Obviously, since all possible permutations count equally, the argument
only works for uniformly distributed drawings, which is somewhat
circular. A more general argument [30, §11], albeit entirely analogous,
departs from a prior knowledge of an arbitrary PMF $\bar{t}$, not necessarily
uniform, of such samples $X_1, \dots, X_k$. Because the empirical distribution
or type T of an i.i.d. drawing is itself an r.v., we may define its PMF
$p(t) = \operatorname{P}\{T = t\}$; formally, the PMF of a random PMF. Using
indicator r.v.'s, it is straightforward to confirm the intuition that
$\operatorname{E} T = \bar{t}$. The general argument in question leads to
approximating the probability p(t) of a type class, a fractional measure of
its size, in terms of its relative entropy, specifically
$2^{-k D(t \,\|\, \bar{t})}$ in the exponent, i.e.,

\[ D(t \,\|\, \bar{t}) \simeq -\frac{1}{k} \log p(t) \quad \text{for } k \gg 1, \]

which encompasses the special case of entropy, by virtue of (1). Roughly
speaking, the likelihood of the empirical distribution t exponentially
decreases with its KL divergence with respect to the average, reference
distribution $\bar{t}$.

In conclusion, the most likely PMF t is that minimizing its divergence
with respect to the reference distribution $\bar{t}$. In the special case of
uniform $\bar{t} = u$, this is equivalent to maximizing the entropy, possibly
subject to constraints on t that reflect its partial knowledge or a
restricted set of feasible choices. The application of this idea to the
establishment of a privacy criterion is the object of the remainder of this
work.
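The relation between the probability of a type and its divergence from the
reference distribution can also be verified numerically. In the Python
sketch below, the function names, the reference PMF (0.5, 0.3, 0.2) and the
sample size are assumptions for illustration; the probability of a type is
computed exactly as a multinomial probability.

```python
from math import factorial, log2


def kl_divergence(t, ref):
    """D(t || ref) in bits; assumes ref[i] > 0 wherever t[i] > 0."""
    return sum(ti * log2(ti / ri) for ti, ri in zip(t, ref) if ti > 0)


def log2_type_probability(counts, ref):
    """log2 of the probability that k = sum(counts) i.i.d. draws from the
    reference PMF `ref` yield exactly counts[i] occurrences of symbol i."""
    coeff = factorial(sum(counts))
    for ki in counts:
        coeff //= factorial(ki)
    return log2(coeff) + sum(
        ki * log2(ri) for ki, ri in zip(counts, ref) if ki > 0
    )


def divergence_estimate(counts, ref):
    """-(1/k) log2 p(t), to be compared with D(t || ref)."""
    return -log2_type_probability(counts, ref) / sum(counts)


if __name__ == "__main__":
    ref = [0.5, 0.3, 0.2]        # reference (population) PMF, toy values
    counts = [200, 300, 500]     # observed counts, k = 1000
    t = [ki / sum(counts) for ki in counts]
    print(divergence_estimate(counts, ref), kl_divergence(t, ref))
```

In this setting, a type equal to the reference PMF is far more likely than
a skewed one, which is the sense in which a lower divergence is read as a
more common, less conspicuous profile.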



4.2 Measuring the Privacy of User Profiles 

We are finally equipped to justify, or at least interpret, our proposal to
adopt Shannon's entropy and KL divergence as measures of the privacy
of a user profile. Before we dive in, we must stress that the use of entropy
as a measure of privacy, in the widest sense of the term, is by no means
new. Shannon's work in the fifties introduced the concept of equivocation
as the conditional entropy of a private message given an observed
cryptogram [36], later used in the formulation of the problem of the wiretap
channel [37,38] as a measure of confidentiality. More recent studies [27,28]
revive the use of entropy as a measure of privacy, by proposing to measure
the degree of anonymity observable by an attacker as the entropy of the
probability distribution of possible senders of a given message. Further
recent work has taken initial steps in relating privacy to
information-theoretic quantities [3,24-26].



In the context of this paper, an intuitive justification in favor of
entropy maximization is that it boils down to making the apparent user
profile as uniform as possible, thereby hiding a user's particular bias
towards certain categories of interest. But a much richer argumentation
stems from Jaynes' rationale behind entropy maximization methods [31,39],
more generally understood under the beautiful perspective of the method
of types and large deviation theory [30, §11], which we motivated and
reviewed in the previous subsection.

Under Jaynes' rationale on entropy maximization methods, the en- 
tropy of an apparent user profile, modeled by a relative frequency his- 
togram of categorized queries, may be regarded as a measure of privacy, or 
perhaps more accurately, anonymity. The leading idea is that the method 
of types from information theory establishes an approximate monotonic 
relationship between the likelihood of a PMF in a stochastic system and 
its entropy. Loosely speaking and in our context, the higher the entropy 
of a profile, the more likely it is, and the more users behave according to 
it. This is of course in the absence of a probability distribution model for 
the PMFs, viewed abstractly as r.v.'s themselves. Under this
interpretation, entropy is a measure of anonymity, not in the sense that the
user's identity remains unknown, but only in the sense that higher
likelihood of an apparent profile, believed by an external observer to be
the actual profile, makes that profile more common, hopefully helping the
user go unnoticed, and less interesting to an attacker assumed to strive to
target peculiar users.

If an aggregated histogram of the population were available as a ref- 
erence profile, the extension of Jaynes' argument to relative entropy, that 
is, to the KL divergence, would also give an acceptable measure of privacy 
(or anonymity). Recall from Sec. 3 that KL divergence is a measure of
discrepancy between probability distributions, which includes Shannon's 
entropy as the special case when the reference distribution is uniform. 
Conceptually, a lower KL divergence hides discrepancies with respect to a 
reference profile, say the population's, and there also exists a monotonic 
relationship between the likelihood of a distribution and its divergence 
with respect to the reference distribution of choice, which enables us to 
regard KL divergence as a measure of anonymity in a sense entirely
analogous to the above mentioned. In fact, KL divergence was used recently
in our own work [3,29] as a generalization of entropy to measure privacy,
although the justification used built upon a number of technicalities, and
the connection to Jaynes' rationale was not nearly as detailed as in this
manuscript.



5 Application of our Privacy Criterion to Query Forgery 

This section applies the information-theoretic privacy criterion proposed
in Sec. 4 to query forgery in private information retrieval. More
specifically, Sec. 5.1 establishes a privacy measure in accordance with our
criterion, which leads to the optimization problem shown in Sec. 5.2,
representing the compromise between privacy risk and the redundancy
introduced by bogus queries. This section has been adapted from our recent
work on query forgery [3], to illustrate the criterion carefully detailed
in this manuscript, and to reach a wider audience than that intended in
our original, densely mathematical work.



5.1 Measuring the Privacy Gained by Forging Queries 

Our mathematical model represents user queries as r.v.'s, which take on 
values in a common, finite alphabet. Preliminarily, we simply model user 
queries as r.v.'s in a rather small set of categories, topics or keywords, 
represented by {1, . . . , n}. User profiles are modeled as the corresponding 
PMFs. 

Bearing in mind these considerations, we shall define p as the
distribution of the population's queries, q as the distribution of
legitimate queries of a particular user, and r as the distribution of
queries forged by that user. In addition, we shall introduce a query
redundancy parameter $\rho \in [0, 1)$, which will represent the ratio of
forged queries to total queries. Concordantly, we shall define the user's
apparent query distribution as the convex combination
$s = (1 - \rho)\, q + \rho\, r$, which will actually be the distribution the
information service provider, or simply a privacy attacker, will observe.
Fig. 1 depicts the intuition that an attacker will be able to compromise a
user's anonymity if the user's apparent query distribution diverges too
much from the population's.




Fig. 1. A user accompanies original queries, submitted to an information 
service provider, with forged ones, in order to go unnoticed. 

Building upon the privacy criteria proposed in Sec. 4, we define the
initial privacy risk as the KL divergence between the user's authentic
profile and the population's distribution, that is, $D(q \,\|\, p)$. Similarly,
we define the (final) privacy risk $\mathcal{R}$ as the KL divergence between
the apparent distribution and the population's, that is,

\[ \mathcal{R} = D(s \,\|\, p) = D((1 - \rho)\, q + \rho\, r \,\|\, p). \]

We have mentioned that entropy maximization was the special case of 
divergence minimization when the reference distribution is uniform. In 
terms of this formulation, for a population profile p = u uniform across 
the n categories of interest, 

\[ D(s \,\|\, u) = \log n - H(s), \]

and, accordingly, we may regard H(s) as a measure of privacy gain, rather 
than risk. 
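The privacy measure of this subsection can be sketched in a few lines of
Python. The function names and the toy profiles below are assumptions made
for illustration; in particular, the forged profile r is chosen by hand
here and is not claimed to be optimal.

```python
from math import log2


def kl_divergence(a, b):
    """D(a || b) in bits; assumes b[i] > 0 wherever a[i] > 0."""
    return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def apparent_profile(q, r, rho):
    """Apparent distribution s = (1 - rho) q + rho r seen by the attacker."""
    if not 0 <= rho < 1:
        raise ValueError("rho must lie in [0, 1)")
    return [(1 - rho) * qi + rho * ri for qi, ri in zip(q, r)]


def privacy_risk(q, r, rho, p):
    """Privacy risk D(s || p) of the apparent profile against the population."""
    return kl_divergence(apparent_profile(q, r, rho), p)


if __name__ == "__main__":
    p = [0.25, 0.25, 0.25, 0.25]   # population profile (toy, uniform)
    q = [0.70, 0.10, 0.10, 0.10]   # genuine user profile (toy)
    r = [0.00, 1 / 3, 1 / 3, 1 / 3]  # hand-picked forged profile
    print("initial risk D(q || p):", privacy_risk(q, r, 0.0, p))
    print("final risk   D(s || p):", privacy_risk(q, r, 0.25, p))
```

With a uniform population profile, as in this toy example, the final risk
equals log n minus the entropy of the apparent profile, so that lowering
the risk and raising the entropy are the same thing.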



5.2 Optimizing Privacy Subject to a Forgery Constraint 



This section presents a formulation of the compromise between privacy
and the redundant traffic due to query forgery, which arises from the
privacy measure introduced in Sec. 5.1. Taking into account the definition
of our privacy criterion, we shall suppose that the population is large
enough to neglect the impact of the choice of r on p. Accordingly, we
define the privacy-redundancy function

\[ \mathcal{R}(\rho) = \min_r D((1 - \rho)\, q + \rho\, r \,\|\, p), \]

which characterizes the optimal trade-off between query privacy (risk) and
redundancy. The minimization variable is the PMF r representing the
optimum profile of forged queries, for a given redundancy $\rho$.

There are two important advantages in modeling the privacy of a 
user profile as a divergence in general, or an entropy in particular, in 
this and other potential applications of our privacy criterion. First, the 
mathematical tractability demonstrated in [3]. Secondly, the
privacy-redundancy function has been defined in terms of an optimization
problem, whose objective function is convex, subject to an affine
constraint. As a consequence, this problem belongs to the extensively
studied class of convex optimization problems [40] and may be solved
numerically, using a number of extremely efficient methods, such as
interior-point methods.
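As a rough numerical illustration of the privacy-redundancy function, the
Python sketch below approximates the minimum over r with a simple
multiplicative (exponentiated-gradient) update on the probability simplex.
This solver is an assumption chosen here because it needs no external
library; it is neither an interior-point method nor the solution derived
in [3], and the function names, iteration count and toy profiles are
likewise illustrative.

```python
from math import log2


def kl_divergence(a, b):
    """D(a || b) in bits; assumes b[i] > 0 wherever a[i] > 0."""
    return sum(ai * log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def privacy_redundancy(q, p, rho, iterations=2000):
    """Approximate R(rho) = min_r D((1 - rho) q + rho r || p).

    Returns (risk in bits, forged profile r). Assumes p[i] > 0 for all i
    and 0 <= rho < 1.
    """
    n = len(q)
    r = [1.0 / n] * n   # strictly positive starting point
    for _ in range(iterations):
        s = [(1 - rho) * qi + rho * ri for qi, ri in zip(q, r)]
        # Exponentiated-gradient step with step size 1/rho, which reduces
        # to r_i <- r_i * p_i / s_i followed by renormalization.
        w = [ri * pi / si if si > 0 else 0.0 for ri, pi, si in zip(r, p, s)]
        total = sum(w)
        r = [wi / total for wi in w]
    s = [(1 - rho) * qi + rho * ri for qi, ri in zip(q, r)]
    return kl_divergence(s, p), r


if __name__ == "__main__":
    p = [1 / 3, 1 / 3, 1 / 3]   # population profile (toy)
    q = [0.7, 0.2, 0.1]         # genuine user profile (toy)
    for rho in (0.0, 0.2, 0.4, 0.6):
        risk, r = privacy_redundancy(q, p, rho)
        print(rho, round(risk, 4), [round(ri, 3) for ri in r])
```

In this toy example the risk decreases as the redundancy grows and
vanishes once the forged queries suffice to make the apparent profile
match the population's; the forged profile concentrates on the categories
in which the user is least active relative to the population.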

A dual version of this problem is that of tag suppression in the
semantic web [29], where entropy is used as a measure of privacy of user
profiles, and users may choose to refrain from tagging certain resources
regarding certain categories of interest. The privacy measure utilized may
be more clearly justified, and immediately extensible to divergences,
under the considerations described in this work.



6 Conclusion 

There are a wide variety of proposals for the problem of PIR, considered 
here in the broadest sense of the term. Within those approaches, query 
forgery arises as a simple strategy in terms of infrastructure requirements, 
as users do not need an external entity to trust. However, this strategy 
poses a trade-off between privacy and the cost of traffic and processing 
overhead. 

In our previous work [3], we presented an information-theoretic
privacy criterion for query forgery in PIR, which arose from the formulation
of the privacy-redundancy compromise. Inspired by the work in [26], the
privacy risk was measured as the KL divergence between the user's
apparent query distribution, containing dummy queries, and the population's.
Our formulation contemplated, as a special case, the maximization of the
entropy of the user's distribution. Preliminarily, we simply model user
queries as r.v.'s in a rather small set of categories, topics or keywords,
and user profiles as the corresponding PMFs.

In this work, we make a twofold contribution. First, we thoroughly 
interpret and justify the privacy metric proposed in our previous work, 
elaborating on the intimate connection between the celebrated method of 
entropy maximization and the use of entropies and divergences as mea- 
sures of privacy. Measuring privacy enables us to optimize it, drawing 
upon powerful tools of convex optimization. The entropy maximization 
method is a beautiful principle amply exploited in fields such as physics, 
electrical engineering and even natural language processing. 

Secondly, we attempt to bridge the gap between the privacy and the 
information-theoretic communities by substantially adapting some tech- 
nicalities of our original work to reach a wider audience, not intimately 
familiar with information theory and the method of types. As neither 
information theory nor convex optimization is fully widespread in the
privacy community, we elaborate and clarify the connection with privacy 
in far more detail, and hopefully in more accessible terms, than in our 
original work. 

Although our proposal arises from an information-theoretic quantity 
and it is mathematically tractable, the adequacy of our formulation relies 
on the appropriateness of the criteria optimized, which depends on several 
factors, such as the particular application at hand, the query statistics 
of the users, the actual network and processing overhead incurred by 
introducing forged queries, the adversarial model and the mechanisms 
against privacy contemplated. 